Latency, Occupancy, and Bandwidth in DSM Multiprocessors: A Performance Evaluation

Authors

Mainak Chaudhuri

Mark Heinrich

Chris Holt

Jaswinder Pal Singh

Edward Rothberg

John L. Hennessy

Abstract

While the desire to use commodity parts in the communication architecture of a DSM multiprocessor offers advantages in cost and design time, the impact on application performance is unclear. We study this performance impact through detailed simulation, analytical modeling, and experiments on a flexible DSM prototype, using a range of parallel applications. We adapt the logP model to characterize the communication architectures of DSM machines. The l (network latency) and o (controller occupancy) parameters are the keys to performance in these machines, with the g (node-to-network bandwidth) parameter becoming important only for the fastest controllers. We show that, of all the logP parameters, controller occupancy has the greatest impact on application performance. Occupancy degrades performance in two ways, by adding latency and by inducing contention; the contention component governs performance regardless of network latency and shows a quadratic dependence on o. As expected, techniques to reduce the impact of latency make controller occupancy a greater bottleneck. Surprisingly, the performance impact of occupancy is substantial, even for highly tuned applications and even in the absence of latency-hiding techniques. Scaling the problem size is often used as a technique to overcome limitations in communication latency and bandwidth. Through experiments on a DSM prototype, we show that there are important classes of applications for which the performance lost by using higher-occupancy controllers cannot be regained easily, if at all, by scaling the problem size.
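
The abstract splits occupancy's cost into the latency it adds and the contention it induces. As an illustration only, and not the authors' analytical model, the sketch below assumes a single protocol controller can be treated as an M/D/1 queue with deterministic service time o and Poisson request arrivals at a fixed rate lam (a hypothetical parameter). The Pollaczek-Khinchine formula then gives a mean queueing delay of lam * o**2 / (2 * (1 - lam * o)), which grows roughly quadratically in o at low load and does not depend on the network latency l. The function names, the two-controller-visit miss path, and all numbers are assumptions made for this sketch.

```python
def queueing_delay(o: float, lam: float) -> float:
    """Mean wait in an M/D/1 queue (assumed controller model).

    o   -- controller occupancy per request (deterministic service time)
    lam -- request arrival rate at the controller (hypothetical, held fixed)
    """
    rho = lam * o  # controller utilization
    if rho >= 1.0:
        raise ValueError("controller saturated: lam * o must be < 1")
    # Pollaczek-Khinchine mean wait for deterministic service time o.
    return lam * o * o / (2.0 * (1.0 - rho))


def remote_miss_cost(net_latency: float, o: float, lam: float, visits: int = 2) -> dict:
    """Hypothetical remote-miss cost split into the two occupancy effects.

    net_latency plays the role of l; each controller visit adds o directly
    (latency part) plus the queueing delay caused by other requests
    (contention part).
    """
    latency_part = 2 * net_latency + visits * o
    contention_part = visits * queueing_delay(o, lam)
    return {
        "latency": latency_part,
        "contention": contention_part,
        "total": latency_part + contention_part,
    }


if __name__ == "__main__":
    # Arbitrary illustrative units (e.g., cycles and requests per cycle).
    for o in (10, 20, 40):
        print(o, remote_miss_cost(net_latency=100, o=o, lam=0.01))
```

With lam = 0.01, the per-visit contention delay is about 0.56, 2.5, and 13.3 for o = 10, 20, and 40, so doubling o more than quadruples it, while the latency part grows only linearly. Holding lam fixed is a simplification: in a real machine, stalled processors issue fewer requests, so this open-queue model overstates contention near saturation.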

Similar Resources

The Effects of Latency and Occupancy in Distributed Shared Memory Multiprocessors

Many designers of distributed shared memory (DSM) multiprocessors are proposing the use of commodity parts, not only in the processor and memory subsystem but also in the communication architecture. While the desire to use commodity parts in the communication architecture offers potential advantages in cost and design time, the impact on the performance of applications is unclear. In this paper...

The Effects of Latency, Occupancy, and Bandwidth in Distributed Shared Memory Multiprocessors

Distributed shared memory (DSM) machines can be characterized by four parameters, based on a slightly modified version of the logP model. The l (latency) and o (occupancy of the communication controller) parameters are the keys to performance in these machines, and are largely determined by major architectural decisions about the aggressiveness and customization of the node and network. For rec...

Meshes vs. Hypercubes: A case study for Distributed Shared-memory Multiprocessors

Distributed shared-memory multiprocessors (DSM) are gaining acceptance because they are easier to program than multicomputers. Recently proposed DSMs use a direct interconnection network to access remote memory locations, making these architectures scalable. Most DSMs implement a cache coherence protocol in hardware. This protocol exchanges data and control messages through the interconnection n...

Efficient ECC-Based Directory Implementations for Scalable Multiprocessors

With increasing chip densities, next-generation microprocessor designs have the opportunity to integrate many of the traditional system-level modules onto the same chip as the processor. This integration changes some of the design trade-offs for how and where to store directory information. One extremely attractive option is to support directory data with virtually no memory space overhead by c...

Predicting Scalability of Parallel Garbage Collectors on Shared Memory Multiprocessors

This paper describes a performance prediction model of parallel mark-sweep garbage collectors (GC) on shared memory multiprocessors. The prediction model takes as inputs the heap snapshot and memory access cost parameters (latency and occupancy), and outputs performance of the parallel marking on any given number of processors. It takes such factors as parallelism (width of the graph), cache mi...

Journal: IEEE Trans. Computers

Volume 52

Publication date: 2003
